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Abstract 

Background:  Human  genetics  and  host-associated  microbial  communities  have  been  associated  independently 
with  a  wide  range  of  chronic  diseases.  One  of  the  strongest  associations  in  each  case  is  inflammatory  bowel  disease 
(IBD),  but  disease  risk  cannot  be  explained  fully  by  either  factor  individually.  Recent  findings  point  to  interactions 
between  host  genetics  and  microbial  exposures  as  important  contributors  to  disease  risk  in  IBD.  These  include 
evidence  of  the  partial  heritability  of  the  gut  microbiota  and  the  conferral  of  gut  mucosal  inflammation  by 
microbiome  transplant  even  when  the  dysbiosis  was  initially  genetically  derived.  Although  there  have  been  several 
tests  for  association  of  individual  genetic  loci  with  bacterial  taxa,  there  has  been  no  direct  comparison  of  complex 
genome-microbiome  associations  in  large  cohorts  of  patients  with  an  immunity-related  disease. 

Methods:  We  obtained  16S  ribosomal  RNA  (rRNA)  gene  sequences  from  intestinal  biopsies  as  well  as  host 
genotype  via  Immunochip  in  three  independent  cohorts  totaling  474  individuals.  We  tested  for  correlation  between 
relative  abundance  of  bacterial  taxa  and  number  of  minor  alleles  at  known  IBD  risk  loci,  including  fine  mapping  of 
multiple  risk  alleles  in  the  Nucleotide-binding  oligomerization  domain-containing  protein  2  ( NOD2 )  gene  exon.  We 
identified  host  polymorphisms  whose  associations  with  bacterial  taxa  were  conserved  across  two  or  more  cohorts, 
and  we  tested  related  genes  for  enrichment  of  host  functional  pathways. 

Results:  We  identified  and  confirmed  in  two  cohorts  a  significant  association  between  NOD2  risk  allele  count  and 
increased  relative  abundance  of  Enterobacteriaceae,  with  directionality  of  the  effect  conserved  in  the  third  cohort. 
Forty-eight  additional  IBD-related  SNPs  have  directionality  of  their  associations  with  bacterial  taxa  significantly 
conserved  across  two  or  three  cohorts,  implicating  genes  enriched  for  regulation  of  innate  immune  response,  the 
JAK-STAT  cascade,  and  other  immunity-related  pathways. 

Conclusions:  These  results  suggest  complex  interactions  between  genetically  altered  host  functional  pathways  and 
the  structure  of  the  microbiome.  Our  findings  demonstrate  the  ability  to  uncover  novel  associations  from  paired 
genome-microbiome  data,  and  they  suggest  a  complex  link  between  host  genetics  and  microbial  dysbiosis  in 
subjects  with  IBD  across  independent  cohorts. 
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Background 

Crohns  disease  (CD)  and  ulcerative  colitis  (UC),  collect¬ 
ively  known  as  inflammatory  bowel  disease  (IBD),  have 
long  been  known  to  have  genetic  risk  factors  due  to  in¬ 
creased  prevalence  in  relatives  of  affected  individuals  as 
well  as  higher  concordance  rates  for  disease  among  mono¬ 
zygotic  versus  dizygotic  twins.  The  sequencing  of  the  hu¬ 
man  genome  and  subsequent  large-cohort  genetic  studies 
has  revealed  a  complex  set  of  polymorphisms  conferring 
varying  levels  of  risk.  Extensive  analyses  of  these  loci  re¬ 
vealed  that  impaired  handling  of  commensal  microbes 
and  pathogens  is  a  prominent  factor  in  disease  develop¬ 
ment  [1].  For  example,  genetically  driven  impaired  func¬ 
tion  of  NOD2  in  the  sensing  of  bacterial  products  like 
lipopolysaccharide  may  cause  an  increase  in  bacteria  that 
produce  those  products.  Involvement  of  the  JAK-STAT 
pathway  in  immune  responses,  and  involvement  of  the  IL- 
23-Thl7  pathway  in  microbial  defense  mechanisms,  are 
also  possible  links  between  impaired  immune  response 
and  imbalances  in  bacterial  assemblage  [1-3].  These  gen¬ 
etic  findings  are  in  line  with  separate,  independent  tests  of 
microbial  shifts  associated  with  IBD.  Shifts  in  taxonomic 
composition  and  metabolic  capabilities  of  the  IBD  micro- 
biome  are  both  now  beginning  to  be  defined  [4-9].  Deter¬ 
mining  the  extent  and  nature  of  host  genome-microbiome 
associations  in  IBD  is  an  important  next  step  in  under¬ 
standing  the  mechanisms  of  pathogenesis.  Despite  the 
documented  independent  associations  of  IBD  with  herit¬ 
able  host  immune  deficiencies  and  with  microbial  shifts, 
there  has  been  limited  study  of  the  co-association  of  com¬ 
plex  host  genetic  factors  with  microbial  composition  and 
metabolism  in  IBD  patients  or  other  populations  [9-17], 
and  the  mechanisms  of  host-microbiome  disease  pathways 
are  largely  unknown. 

Using  three  independent  cohorts  comprising  474  adult 
human  subjects  with  IBD  aged  18  to  75  years,  we  tested 
known  IBD-associated  host  genetic  loci  for  enrichment  of 
association  with  gut  microbiome  taxonomic  composition. 
Cohorts  were  located  near  Boston  (USA),  Toronto  (Canada), 
and  Groningen  (the  Netherlands),  with  152,  160,  and  162 
subjects,  respectively.  The  cohorts  contained  62.5%,  14.3%, 
and  63.5%  CD  cases  with  the  remainder  cases  of  UC,  and 
31.5%,  11.3%,  and  53.1%  biopsies  from  inflamed  sites,  re¬ 
spectively  (detailed  summary  statistics  by  cohort  and  biopsy 
location  in  Figures  SI  and  S2  in  Additional  file  1).  The  To¬ 
ronto  cohort  contained  70.6%  biopsies  from  the  pre-pouch 
ileum  in  subjects  with  previous  ileo-anal  pouch  surgery;  all 
remaining  samples  were  from  the  colon  and  terminal  ileum, 
with  73.0%,  18.1%,  and  87.0%  from  the  colon  in  the  three 
cohorts,  respectively.  We  excluded  all  subjects  that  had 
taken  antibiotics  within  one  month  prior  to  sampling.  We 
obtained  genotyping  with  Illumina  Immunochip  assays  [18] 
and  16S  rRNA  gene  sequences  as  described  previously  [19] 
(SNP  prevalence  by  cohort  in  Additional  file  2).  We  rarefied 


bacterial  microbiome  samples  to  an  even  sequencing  depth 
of  2,000  sequences  per  sample  to  control  for  differential  se¬ 
quencing  effort  across  cohorts.  This  rarefaction  depth  allows 
us  to  observe  taxa  with  relative  abundance  as  low  as  0.15% 
with  95%  confidence  in  each  sample  (binomial  distribution 
with  2,000  trials  and  probability  0.0015).  We  report  a 
pathway-level  analysis  of  complex  functional  associations 
between  host  genetics  and  overall  microbiome  composition, 
as  well  as  a  targeted  analysis  of  the  association  of  NOD2 
with  specific  bacterial  taxa. 

Methods 

Ethics  and  consent 

This  study  was  approved  by  the  Partners  Human  Re¬ 
search  Committee,  116  Huntington  Avenue,  Boston, 
MA,  USA.  Patients  gave  informed  consent  to  participate 
in  the  study.  This  study  conformed  to  the  Helsinki  Dec¬ 
laration  and  to  local  legislation. 

Data  collection  and  generation 

We  genotyped  subjects  using  the  Immunochip  platform 
as  described  previously  [18],  excluding  polymorphisms 
with  minor  allele  frequency  of  0.1  or  below  from  subse¬ 
quent  testing.  16S  rRNA  genes  were  extracted  and  ampli¬ 
fied  from  intestinal  biopsies  and  sequenced  on  the 
Illumina  MiSeq  platform  using  published  methods  [20]. 
These  procedures  include  extraction  using  the  QIAamp 
DNA  Stool  Mini  Kit  (Qiagen,  Inc.,  Valencia,  CA,  USA)  ac¬ 
cording  to  the  manufacturer  s  instructions  with  minor  al¬ 
terations  described  in  prior  work  [20],  followed  by 
amplification  using  the  16S  variable  region  4  forward  pri¬ 
mer  GTGCCAGCMGCCGCGGTAA  and  reverse  primer 
GGACTACHVGGGTWTCTAAT,  followed  by  barcoded 
multiplexing  and  sequencing.  Only  one  biopsy  was  used 
per  subject;  when  multiple  biopsies  were  available  we  se¬ 
lected  the  non-inflamed  biopsy  first. 

Data  processing 

We  extracted  risk  allele  counts  for  163  published  genetic 
risk  loci  for  CD,  UC,  and  IBD  [1].  When  combining  data 
from  separate  Immunochip  runs  we  tested  for  strand  in¬ 
versions  by  linkage  disequilibrium  with  neighboring  vari¬ 
ants  using  plink  [21].  Microbial  operational  taxonomic 
units  (OTUs)  and  their  taxonomic  assignments  were 
obtained  using  default  settings  in  QIIME  version  1.8  [22] 
by  reference-mapping  at  97%  similarity  against  representa¬ 
tive  sequences  of  97%  OTU  in  Greengenes  (taxa  version 
4feb2011;  metagenome  version  12_10)  [23].  We  used  all 
default  settings  in  QIIME  1.8  for  OTU  mapping,  and  we 
used  the  pre-assigned  taxonomy  for  the  Greengenes  OTU 
representative  sequences.  Samples  were  rarefied  to  an 
even  sequence  depth  of  2,000  sequences  per  sample  to 
control  for  varied  sequencing  depth.  Taxa  were  collapsed 
into  clusters  with  >0.95  Pearsons  correlation  to  remove 
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redundant  signals  in  the  data  (Additional  file  3).  Principal 
coordinates  of  between- subject  distances  were  obtained 
from  UniFrac  [24]  distances  of  OTUs  and  Jensen-Shannon 
and  Bray-Curtis  distances  of  KEGG  (Kyoto  Encyclopedia  of 
Genes  and  Genomes)  module  and  pathway  distributions. 
Bacterial  taxa  were  arcsine-square-root  transformed  and 
bacterial  functions  were  power-transformed  ('car'  package 
[25])  to  stabilize  variance  and  reduce  heteroscedasticity. 

Statistical  analysis 

Linear  association  tests  were  performed  only  within  those 
taxa  with  nonzero  abundance  in  at  least  75%  of  subjects. 
Taxa  below  that  threshold  were  subjected  to  logistic  regres¬ 
sion  for  presence/absence;  no  such  taxa  revealed  significant 
associations  after  correcting  for  multiple  comparisons.  To 
ensure  robustness  of  tests  to  outliers,  subjects  with  taxon 
or  functional  module  relative  abundance  more  than  three 
times  the  interquartile  range  from  the  mean  were  excluded 
for  tests  of  that  feature.  Power  analysis  was  performed 
using  the  linear  effect  size  that  we  observed  for  Enterobac- 
teriaceae  when  regressing  on  NOD2  risk  allele  count  and 
controlling  linearly  for  clinical  covariates  (f2  =  R2/(l  -  R2)  = 
0.013;  R  is  the  coefficient  of  multiple  correlation).  Assum¬ 
ing  the  need  to  correct  for  testing  of  all  163  IBD  loci  against 
22  dominant  taxa  (3,586  tests;  adjusted  significance  thresh¬ 
old  =  1.39  x  10-5),  we  would  need  at  least  3,729  samples  to 
power  the  full  analysis  (R  pwr  package  power  calculation 
for  a  linear  model  with  19  numerator  degrees  of  freedom). 
Discrete  qualitative  covariates  were  re-coded  with  dichot¬ 
omous  dummy  variables  representing  each  class  prior  to 
testing.  Association  of  clinical  covariates  was  performed 
jointly  by  multiple  linear  regression.  To  overcome  redun¬ 
dancy  between  clinical  covariates,  we  clustered  clinical  co¬ 
variates  based  on  their  pairwise  maximum  uncertainty 
coefficients  [26],  an  information- theoretic  measure  of  their 
degrees  of  shared  information.  Continuous-valued  covari¬ 
ates  were  discretized  prior  to  information-theoretic  cluster¬ 
ing.  Complete-linkage  clustering  was  performed  to  identify 
groups  of  covariates  in  which  each  covariate  contained  at 
least  50%  of  the  information  contained  in  each  other  covar¬ 
iate.  Network  plots  were  created  using  the  igraph  [27]  pack¬ 
age.  For  the  network  plot  of  non-genetic  host  factors  and 
NOD2 ,  width  of  edges  was  determined  by  the  ratio  of  a 
given  covariate  s  linear  regression  coefficient  to  the  mean  of 
the  regressed  taxons  relative  abundance.  Enrichment  of  a 
host  functional  pathway  for  association  with  bacterial  taxa 
was  assessed  by  comparing  the  observed  rank  product  of 
all  host  gene-bacterial  taxon  association  tests  for  all  genes 
in  the  pathway  with  the  distribution  of  rank  products  of 
100,000  size-matched  pathways  randomly  generated  from 
the  null  Immunochip  variants  described  above.  Prior  to 
testing,  REACTOME  pathways  with  >75%  overlap  were 
binned  and  the  largest  constituent  pathway  chosen  as  a  rep¬ 
resentative  for  subsequent  tests. 


Results  and  discussion 

Genotype-microbiome  associations  conserved  across 
independent  cohorts 

Our  genotype-microbiome  association  testing  methodology 
included  steps  to  overcome  power  limitations  given  the 
very  large  number  of  potential  comparisons,  to  incorporate 
published  knowledge  of  signaling  and  metabolic  pathways 
in  the  host  genome,  and  to  control  for  multiple  environ¬ 
mental  host  factors  affecting  gut  microbiome  composition 
(Figure  1).  In  a  targeted  analysis  of  NOD2 ,  we  also 
accounted  for  multiple  causal  variants  in  the  genetic  locus 
(Supplementary  methods  in  Additional  file  1).  After  data 
preprocessing  and  normalization  we  tested  linearly  for  as¬ 
sociation  of  risk  allele  count  in  each  SNP  with  the  relative 
abundance  of  each  bacterial  taxon.  In  all  tests,  we  con¬ 
trolled  for  recent  antibiotic  usage  (<1  month),  recent  im¬ 
munosuppressant  usage  (<1  month),  biopsy  inflammation 
status  based  on  pathology,  age,  gender,  biopsy  location, 
CD/UC  diagnosis,  disease  location,  elapsed  time  since  diag¬ 
nosis,  cohort  membership,  and  the  first  three  principal 
components  of  genotype  variation  (Figure  1;  Figure  S3  in 
Additional  file  1).  Although  the  IBD-related  SNPs  extracted 
from  the  Immunochip  data  were  identified  previously  in 
European  populations,  we  do  not  expect  this  to  limit  our 
findings  because  our  cohorts  were  mostly  of  European  des¬ 
cent.  We  validated  our  linear  testing  methodology  by  com¬ 
paring  associations  in  the  Boston  cohort  with  those  in  the 
other  two  cohorts,  in  addition  to  performing  other  sensitiv¬ 
ity  analyses  (Supplementary  methods  in  Additional  file  1). 

We  tested  163  recently  IBD-associated  SNPs  for  associ¬ 
ation  with  bacterial  taxonomic  profiles;  154  remained  after 
removing  those  with  low  minor  allele  frequencies  or  with 
low  call  rates  in  our  cohorts  (Supplementary  methods  in 
Additional  file  1).  Many  of  these  SNPs  have  unknown 
mechanisms  and  are  likely  only  representative  of  a  signal 
within  the  surrounding  genomic  locus.  Therefore,  when  a 
single  gene  was  associated  previously  with  a  SNP,  we  refer 
to  that  SNP  by  the  gene  name  for  convenience.  Due  to 
limited  statistical  power  we  were  unable  to  perform  a 
full  analysis  of  all  possible  SNP-taxon  associations 
(Supplementary  methods  in  Additional  file  1).  However, 
we  were  able  to  test  for  the  robustness  of  microbiome- 
wide  associations  with  a  given  SNP  by  comparing  the 
directionality  of  the  SNP-taxon  coefficients  between  inde¬ 
pendent  cohorts.  For  this  test  we  included  only  those 
SNP-taxon  associations  for  a  given  SNP  that  were  nomin¬ 
ally  significant  ( P  <  0.05)  in  at  least  one  of  the  studies  be¬ 
ing  compared.  We  then  obtained  Matthews  correlation 
coefficient  (MCC;  also  known  as  the  phi  coefficient)  of  the 
signs  (positive  or  negative)  of  the  SNP-taxon  coefficients 
in  one  study  with  the  signs  of  corresponding  SNP-taxon 
coefficients  in  the  second  study,  and  corrected  these 
microbiome-wide  tests  for  multiple  comparisons  (one 
MCC  test  per  gene)  at  a  false  discovery  rate  (FDR)  of  0.25. 
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Environment 

Figure  1  Schematic  of  multiomics  genotype-microbiome  association  testing  methodology.  Host  genome-microbiome  association  testing  involves 
potentially  thousands  or  millions  of  genetic  polymorphisms  and  hundreds  or  thousands  of  bacterial  taxa  and  genes.  Full  feature-by-feature  association  testing 
is  likely  to  be  underpowered  in  all  but  the  largest  cohorts  or  meta-analyses;  therefore,  our  methodology  includes  careful  feature  selection  from  both  data 
types.  Raw  genetic  polymorphisms  were  derived  from  Immunochip  data  and  filtered  by  known  IBD  associations  from  a  large-cohort  GWAS  study  [1], 
Microbiome  seguences  were  binned  by  lineage  at  all  taxonomic  levels.  After  data  normalization  and  filtering  (see  Methods),  a  simple  linear  test  was 
performed  for  association  between  minor  allele  count  and  bacterial  taxon  relative  abundance  while  controlling  for  clinical  covariates.  QTL,  quantitative 
trait  loci. 


We  chose  the  FDR  of  0.25  for  this  analysis  due  to  the  large 
number  of  tests  and  the  fact  that  we  used  the  significant 
results  mainly  to  test  for  enrichment  of  certain  host  path¬ 
ways,  rather  than  to  focus  on  individual  associations.  We 
note  that  it  is  important  to  compare  only  the  directionality 
of  SNP-taxon  effects  between  studies,  and  not  the  magni¬ 
tudes  of  the  SNP-taxon  regression  coefficients,  because 
the  magnitude  of  a  coefficient  is  closely  linked  to  the 
mean  relative  abundance  of  a  given  taxon.  To  decrease 
bias  toward  a  particular  taxonomic  level  of  association 
[28],  we  performed  these  tests  using  bacterial  taxa  at  all 
taxonomic  levels  from  phylum  to  genus,  collapsing  those 
with  redundant  signals.  In  contrast  to  using  OTU  clusters, 
binning  by  taxonomy  allows  inherent  flexibility  in  the  level 
of  16S  gene  sequence  identity  within  each  bin  in  different 
lineages. 

A  number  of  host  genes,  some  with  known  involvement 
in  microbial  handling,  and  others  with  unknown  function, 
demonstrated  reproducible  effects  on  the  taxonomic  struc¬ 
ture  of  the  microbiome  across  two  or  more  cohorts.  Effect 
size  and  directionality  of  genotype-microbiome  associations 
were  highly  reproducible  between  cohorts  in  the  case  of 
NOD2  and  48  other  host  genes  (FDR  <0.25;  Additional 


file  4).  NOD2  had  one  of  the  most  highly  reproducible  sets 
of  associations  with  bacterial  taxa  (MCC  =  0.75,  FDR  =  2.6  x 
10'4  comparing  Boston  versus  Toronto  cohorts;  MCC  = 
0.85,  FDR  =  7.7  x  10'4  Boston  versus  Netherlands;  Figure  2a). 
Other  genes  with  significantly  conserved  directionality  of  ef¬ 
fects  on  bacterial  taxa  between  at  least  one  pair  of  studies 
included  tumor  necrosis  factor  (ligand)  superfamily,  mem- 
ber  15  ( TNFSF1S;  MCC  =  0.87,  FDR  =  9.5  x  10'3,  Boston 
versus  Netherlands)  and  subunit  beta  of  interleukin  12 
{IL12B;  MCC  =  0.74,  FDR  =  1.5  x  10'3,  Boston  versus 
Netherlands). 

NOD2  variants  were  the  first  genetic  associations  iden¬ 
tified  in  CD,  and  they  remain  some  of  the  strongest  risk 
factors.  NOD2-driven  murine  dysbiosis  causes  inflamma¬ 
tion  even  when  the  dysbiotic  microbiota  are  transplanted 
into  a  wild-type  mouse  [13].  Expression  of  TNFSF1S ,  a 
member  of  the  tumor  necrosis  factor  ligand  superfamily, 
causes  proinflammatory  cytokine  production,  and  is  spe¬ 
cifically  expressed  more  highly  in  the  gut  in  IBD  patients 
compared  with  healthy  controls.  Interestingly,  a  receptor 
for  a  member  of  the  same  family,  TNFSF14,  enhances 
immune  response  to  pathogenic  bacteria  via  signal  trans¬ 
ducer  and  activator  of  transcription  3  ( STAT3 )  activation 
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Figure  2  NOD2  fine  mapping  reveals  association  with  taxonomic  and  metabolic  dysbiosis,  (a)  Scatterplot  of  N0D2-bacterial  taxon  regression 
coefficients  in  one  study  versus  the  corresponding  regression  coefficients  in  another  study.  We  included  only  those  taxa  with  a  nominally  significant  (P<  0.05) 
association  in  a  least  one  of  the  cohorts  being  compared,  (b)  Comparison  of  residual  distributions  of  Enterobacteriaceae  with  and  without  incorporating  the 
six  independent  known  causal  NOD2  variants;  considering  variant  rs5743293,  only  6.3%  of  subjects  have  one  or  more  risk  alleles;  aggregating  risk  allele  counts 
across  the  six  variants  increases  this  to  21.8%,  and  reveals  much  stronger  associations  with  the  microbiome.  The  strip  charts  and  violin  plots  show  the 
distribution  of  standardized  residual  relative  abundance  of  Enterobacteriaceae  versus  NOD2  risk  allele  dosage  after  data  transformation  and  regression  on 
clinical  covariates.  Violin  plots  show  the  conditional  density  of  residual  relative  abundance  within  each  dosage  level,  (c)  Relative  positions  of  six  NOD2  variants 
in  NOD2  exons  [29], 


in  a  mouse  model  of  Escherichia  coli  infection.  TNFSF14 
and  TNFSF15  are  known  to  share  an  alternative  recep¬ 
tor,  indicating  potential  functional  overlap.  IL12B  forms 
part  of  the  interleukin-23  complex,  involved  in  microbial 
defense  mechanisms  through  the  IL23-Thl7  pathway. 

Immunity-related  host  functional  pathways  linked  to 
microbiome  profile 

We  hypothesized  that  host  functional  pathways  containing 
multiple  risk  variants  related  to  microbial  handling  and  in¬ 
nate  immune  response  would  be  associated  with  micro¬ 
biome  features.  To  test  this  hypothesis  we  performed  a 
functional  enrichment  analysis  on  the  49  genes  identified 
to  have  conserved  microbiome  associations  across  co¬ 
horts.  We  found  these  genes  to  be  significantly  enriched 
for  regulation  of  innate  immune  response  (FDR  =  2.31  x 
10'6,  hypergeometric  enrichment  test),  inflammatory  re¬ 
sponse  (FDR  =  7.43  x  10'6,  hypergeometric  enrichment 
test),  and  participation  in  the  JAK-STAT  cascade  (FDR  = 
2.04  x  10"4,  hypergeometric  enrichment  test)  (Figures  3 
and  4;  Additional  file  5).  A  gene-gene  interaction  network 
analysis  also  implicated  STAT3,  interleukin- 12  subunit 


alpha  (. IL12A ),  and  interleukin-23  subunit  alpha  ( IL23A )  in 
the  network  of  associated  genes. 

STAT3  and  TNFSF1S  are  both  implicated  in  IL23  sig¬ 
naling.  STAT3  works  in  concert  with  Janus  Kinase  2 
( JAK2 )  in  the  JAK-STAT  pathway  to  drive  immune  re¬ 
sponse  to  pathogenic  infection.  STAT3  also  regulates  T 
helper  17  (Thl7)  cell  differentiation  by  binding  IL23  re¬ 
ceptor  (IL23R;  risk  variant  for  IBD:  rsl  1209026)  and 
RAR-related  orphan  receptor  C  (RORC;  rs4845604), 
both  of  which  are  located  in  IBD  risk  loci.  STAT3  defects 
have  also  been  implicated  recently  in  skin  microbial  im¬ 
balance  and  impaired  host  defense.  TNFSF15,  a  member 
of  the  tumor  necrosis  factor  ligand  superfamily,  is  a  cost¬ 
imulator  of  T  cells,  and  is  specifically  expressed  more 
highly  in  the  gut  in  IBD  patients  compared  with  healthy 
controls  [31,32]. 

Fine  mapping  of  NOD2  locus  reveals  association  with 
Enterobacteriaceae 

Based  on  previous  results  [9-13]  and  on  the  strong  linkage 
between  NOD2  and  microbial  handling  [9,12,13],  we  con¬ 
tinued  with  a  targeted  analysis  of  NOD2  association  with 
specific  microbial  taxa  and  functions  (Additional  file  6). 
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Figure  3  Host  genes  with  reproducible  microbiome  associations  across  cohorts.  Network  analysis  of  host  signaling  and  metabolic  pathways 
enriched  for  association  with  microbial  taxa  (FDR  <0.25,  Matthew's  correlation  test).  The  visualization  of  gene-gene  interaction  network  for  the 
subset  of  49  genes  with  significantly  conserved  directionalities  of  association  with  the  microbiome  is  supported  by  several  types  of  gene-gene 
connection  [30],  This  enrichment  analysis  identified  enriched  functional  networks  in  innate  immune  response,  inflammatory  response,  and  the 
JAK-STAT  pathway,  all  of  which  play  roles  in  immune  response  to  pathogen  infection  [1], 


For  all  NOD2  testing  we  aggregated  risk  allele  dosage 
across  six  known  causal  variants:  rs2066844  (R702W), 
rs2066845  (G908R),  rs5743277,  rs5743293  (fsl007insC), 
rsl04895431  and  rsl04895467  [29].  This  novel  method¬ 
ology  was  crucial  as  individual  variants  contained  only 
part  of  the  signal  (Figure  2b, c).  NOD2  was  associated  with 
the  first  principal  axis  of  taxonomic  (via  weighted  UniFrac 
distances)  microbiome  variation  (FDR  <0.05,  controlling 
for  three  principal  axes  tested),  linking  NOD 2  to  shifts  in 
overall  microbiome  taxonomic  composition.  We  identified 
increased  Enterobacteriaceae  in  subjects  with  higher 
NOD2  risk  allele  dosage  (FDR  =  0.11,  controlling  for  mul¬ 
tiple  taxa  tested;  Figure  2b;  Additional  file  7).  An  increase 
in  Gammaproteobacteria  is  a  known  component  of  IBD 
dysbiosis  and  is  associated  with  inflammation  in  mice  and 
humans  [4,33]  and  with  increased  epithelial  penetration  in 
CD  and  UC  [34].  NOD2  also  had  one  of  the  most  strongly 
reproducible  associations  with  microbiome  composition 
when  comparing  cohorts.  Although  NOD2  is  only  associ¬ 
ated  with  increased  risk  of  CD,  A/OD2-microbiome  associ¬ 
ations  we  observed  were  generally  independent  of  CD/UC 
diagnosis,  with  high  overlap  between  CD  and  UC  when 
tested  separately  (taxa:  Spearmans  rho  =  0.57,  P  =  6.5  x  10'3; 
Figure  S4  in  Additional  file  1).  This  implies  that  the  associ¬ 
ation  may  be  disease-independent,  and  may  play  a  role  in 
pathogenesis  only  in  subjects  with  other  risk  factors.  For 
example,  NOD2  SNP  rs5743293  is  associated  with 


complications  in  ileo-anal  pouch  patients  despite  their  ori¬ 
ginal  diagnosis  being  UC  [35-38]. 

A  complex  web  of  genotype-environment-microbiome 
interactions  in  IBD 

Our  findings  indicate  that  host  genetics  are  part  of  a  com¬ 
plex  web  of  host-associated  factors  influencing  micro¬ 
biome  composition.  We  performed  a  meta-analysis  of 
interactions  between  clinical  host  factors  and  bacterial 
taxa  using  the  above  474  subjects  and  an  additional  55 
subjects  who  had  recently  taken  antibiotics.  This  analysis 
included  NOD2  as  one  of  the  host  factors.  We  identified 
an  additional  99  significant  associations  of  non-genetic 
factors  with  relative  abundance  of  specific  bacterial  taxa, 
largely  in  agreement  with  previous  analyses  of  the  IBD 
microbial  environment  [4].  To  visualize  the  overlap  of  in¬ 
teractions  between  various  host  factors  and  microbial 
taxa,  we  constructed  a  network  of  associations  between 
bacterial  taxa  and  observed  host  and  environmental  fac¬ 
tors  (Figure  5;  Additional  file  8;  Supplementary  methods 
in  Additional  file  1). 

This  analysis  revealed  a  web  of  complex  overlapping  link¬ 
ages  between  numerous  host  factors  and  bacterial  taxa.  For 
example,  recent  antibiotic  usage  is  associated  with  system¬ 
atic  shifts  in  many  major  taxonomic  groups  (Figure  6;  FDR 
<0.05);  immunosuppressants  are  associated  with  decreased 
Firmicutes,  and  Ruminococcaceae.  Biopsy  location  and 
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Figure  4  Top  gene-bacteria  associations.  Beeswarm  plots  of  the  relative  abundance  of  six  bacteria  stratified  by  the  number  of  risk  alleles  present  in 
SNPs  in  the  given  genes.  The  associations  shown  are  the  six  most  significant  associations  between  bacteria  and  genes  in  the  subset  of  genes  with 
conserved  bacterial  associations  across  cohorts  and  belonging  to  the  JAK-STAT  pathway  or  the  innate  immune  pathway  response  as  shown  in  Figure  3. 
Relative  abundances  shown  are  transformed  with  the  arcsine-square  root  transformation  to  stabilize  variance  and  to  make  distributions  more  normal. 


cohort  membership  had  similarly  broad  effects;  age,  gender, 
and  disease  phenotype  had  measurable,  although  less  broad, 
effects;  genotype,  as  represented  by  the  NOD2  subtype,  had 
a  modest  effect  in  relation  to  other  factors.  Inflammatory 
status  of  the  biopsied  tissue  was  associated  with  increased 
relative  abundance  of  unclassified  members  of  Lactobacillus , 
and  with  decreased  relative  abundance  of  Bacteroides  unifor- 
mis  (Figure  S5  in  Additional  file  1).  This  analysis  demon¬ 
strates  the  comprehensive  and  intermingled  effects  of 
treatment  history,  gastrointestinal  biogeography,  and  other 
host  and  environmental  factors  on  gut  microbiome  profile 
and  makes  clear  the  need  to  account  for  host  factors  when 
linking  host  genotype  to  microbial  composition  in  a  pheno- 
typically  heterogeneous  population.  We  confirmed  that  host 
genetics  as  a  whole  do  have  a  significant  effect  on  micro¬ 
biome  profile  by  correlating  overall  between-subject  genetic 
distance  (Manhattan  distance)  with  overall  between-subject 
microbiome  distance  (unweighted  UniFrac  distance) 
( P  <  5.0  x  10'10;  Figure  S6  in  Additional  file  1),  but  that  it  is 
only  a  minor  contributor  in  the  context  of  other  sources  of 
variation.  A  recent  study  of  treatment-naive  pediatric  pa¬ 
tients  with  CD  identified  consistent  microbiome  shifts  in  pa¬ 
tients  with  recent  antibiotic  exposure  toward  the  disease- 
related  state  [20].  That  study  exemplified  the  need  to  control 


for  the  potentially  confounding  effects  of  antibiotics  when 
attempting  to  identify  bacterial  profiles  associated  with  dis¬ 
ease.  Based  on  several  studies  linking  short-  and  long-term 
dietary  exposure  to  microbiome  profile,  it  is  also  likely  to  be 
useful  to  include  food  intake  diaries  or  dietary  recall  ques¬ 
tionnaires  in  future  genotype-microbiome  research  [39,40]. 

Antibiotics  contribute  to  IBD  dysbiosis  independent  of 
NOD2  effects 

The  fact  that  host  genetics  are  a  minor  contributor  to 
overall  microbiome  composition  relative  to  environmental 
factors  does  not  exclude  the  possibility  that  genotype- 
microbiome  interactions  play  an  important  role  in  the  eti¬ 
ology  of  IBD;  it  is  possible  that  the  important  variations 
are  in  a  particular  set  of  taxa  or  a  particular  set  of  func¬ 
tions  (for  example,  resistance  to  oxidative  stress)  that 
make  up  a  minor  portion  of  the  overall  microbiome,  while 
there  are  other  taxa  not  closely  related  to  IBD  but  highly 
influenced  by  the  hosts  environmental  exposures  (for  ex¬ 
ample,  dietary  exposures).  Such  a  subset  of  taxa  related  to 
dysbiosis  were  reported  in  a  recent  comparison  between 
treatment-naive  patients  with  Crohns  disease  and  healthy 
controls  [20],  and  the  ratio  of  the  disease-associated  taxa 
to  the  health-associated  taxa  was  referred  to  as  the 
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Figure  5  Host  factors  associated  with  the  IBD  microbiome.  A  complex  network  of  host  factors  associated  with  the  IBD  microbiome 
(all  associations  FDR  <0.05);  only  taxa  with  at  least  four  significant  associations  are  included  in  the  network;  green  and  purple  edges  indicate 
positive  and  negative  associations,  respectively;  the  width  of  an  edge  indicates  the  strength  of  the  association.  The  effects  of  these  factors  on 
individual  taxa  are  highly  overlapping.  The  analysis  identified  covariates  representing  each  type  of  host  factor,  consistent  with  previous  results  [4], 
Biopsy  location  and  medication  history  had  the  strongest  and  most  comprehensive  effects  on  microbiome  profile;  the  effect  of  NOD2  was  moderate 
in  comparison.  Cohort  membership  (not  shown)  also  affected  microbiome  profile.  These  results  demonstrate  the  need  for  study  designs  and  analysis 
methodologies  that  control  carefully  for  numerous  host  genetic  and  environmental  factors  when  performing  microbiome-based  biomarker  discovery. 
Abx,  antibiotics  within  1  month;  Imm,  immunosuppressants  within  1  month;  L2,  no  ileal  involvement;  PPI,  biopsy  from  pre-pouch  ileum. 


microbial  dysbiosis  index  (MDI).  This  recent  study  identi¬ 
fied  an  increase  in  the  MDI  scores  of  patients  who  had  re¬ 
cently  received  antibiotics,  indicating  that  antibiotics  tend 
to  shift  patient  microbiomes  further  into  the  realm  of 
IBD -related  dysbiosis.  We  used  the  same  taxa  as  reported 


Figure  6  Association  of  IBD-related  dysbiosis  and  recent  antibiotics 
usage.  A  beeswarm  plot  [41]  of  the  previously  published  microbial 
dysbiosis  index  [20]  (MDI)  stratified  by  recent  antibiotics  usage  by  patients. 
The  test  for  this  association  between  MDI  and  antibiotics  (P  =  0.039,  linear 
regression  r-test)  included  NOD2  risk  allele  count  to  control  for  the  effects 
of  NOD2  genetics  on  the  microbiome. 


previously  to  calculate  an  MDI  score  for  each  patient  in 
our  analysis.  In  our  cohorts  we  confirmed  the  published 
finding  that,  when  controlling  for  NOD2  effects  on  micro¬ 
biome  structure,  MDI  score  tended  to  be  higher  in  pa¬ 
tients  with  recent  usage  (within  less  than  one  month)  of 
antibiotics  (P  =  0.039,  £-test  of  linear  regression  coefficient) 
(Figure  6).  This  finding,  together  with  previously  pub¬ 
lished  findings  regarding  the  effects  of  antibiotics  on  the 
IBD  microbiome  suggest  that  antibiotics  and  duration  of 
disease  are  additional  risk  factors  for  IBD-related 
dysbiosis. 

Conclusions 

Taken  together,  our  findings  indicate  a  complex  set  of 
associations  between  the  mucosal-adherent  microbiome 
and  genetic  impairment  of  several  host  immune  path¬ 
ways.  Although  we  have  been  living  and  evolving  with 
our  microbial  symbionts  throughout  human  evolution, 
we  have  only  been  aware  of  their  existence  for  a  few 
centuries,  and  the  genetic  and  functional  diversity  of  our 
so-called  second  genome'  has  only  become  apparent  in 
the  last  few  decades.  Also  in  recent  decades  incidence  of 
IBDs  and  other  autoimmune  and  autoinflammatory  dis¬ 
eases  has  increased  dramatically  [42],  and  a  rapidly 
growing  set  of  these  diseases  has  been  linked  to  shifts 
both  in  taxonomic  carriage  and  functional  potential  of 
host-associated  microbial  communities.  Although  our 
data  are  cross-sectional  and  therefore  cannot  define 
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causality,  our  analyses  demonstrate  complex  host  genetic 
associations  with  taxonomic  and  metabolic  dysbiosis  in 
humans.  These  include  implications  of  microbiome-wide 
associations  with  TNFSF1S ,  IL12B ,  and  with  innate 
immune  response,  inflammatory  response,  and  the  JAI<- 
STAT  pathway,  as  well  as  M)D2-related  increases  in  En- 
terobacteriaceae  relative  abundance.  Future  studies  may 
be  warranted  to  account  for  the  effects  of  copy  number 
variation,  pleiotropic  genes  and  epigenetic  modifications. 
It  is  also  possible  that  certain  genotype-microbiome  as¬ 
sociations  observed  in  IBD  patients  may  be  disease- 
independent  and  may  be  relevant  to  healthy  individuals 
and  individuals  with  other  diseases.  The  methods  we 
employed  were  validated  on  independent  cohorts  and 
make  possible  well-powered  false-positive-controlled 
testing  of  microbiome-wide  host  genetic  associations. 
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